Institutional Repository Keyword Analysis with Web Crawler
Abstract
This study investigates procedures for the semantic and linguistic extraction of keywords from the metadata of documents indexed in the Institutional Repository Unesp. For that purpose, a web crawler was developed, which collected 325,181 keywords assigned by authors, in all fields of knowledge, covering February 28th, 2013 to November 10th, 2021. The collection and analysis environment was prepared in the Python programming language and composed of three libraries: requests, which handles the hyperlinks of the webpages visited by the crawler; BeautifulSoup, which extracts data from the HTML of each webpage; and Pandas, an open-source (free software) library that provides high-performance data analysis tools. The final listing consisted of 273,485 keywords, a reduction of 15.9% from the initially collected set. The results indicated that the most recurring problem was duplication, with 51,696 duplicated keywords that are indicators of inconsistencies in the search for documents. It is concluded that refining the keywords assigned by authors eliminates the incorporation of a set of symbols that do not represent the authors' keywords with the same spelling, but with upper/lower case variations or lexical variations indexing different ...
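The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' crawler: the abstract does not say which HTML elements hold the keywords or how duplicates were matched, so the `DC.subject` meta tag, the case-insensitive matching rule, and the function names `fetch_html`, `extract_keywords`, and `deduplicate` are all assumptions made for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download one record page; the caller supplies the URL (hypothetical helper)."""
    import requests  # imported here so the parsing steps run without network access

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_keywords(html: str) -> list[str]:
    """Read keywords from <meta name="DC.subject"> tags (assumed markup)."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("meta", attrs={"name": "DC.subject"})
    return [tag["content"].strip() for tag in tags if tag.get("content", "").strip()]


def deduplicate(keywords: list[str]) -> pd.DataFrame:
    """Collapse keywords that differ only in case or surrounding whitespace (assumed rule)."""
    frame = pd.DataFrame({"keyword": keywords})
    frame["normalized"] = frame["keyword"].str.strip().str.casefold()
    # Keep the first spelling seen for each normalized form.
    return frame.drop_duplicates(subset="normalized").reset_index(drop=True)


# Inline sample page, so the sketch runs without contacting any repository.
sample_html = """
<html><head>
<meta name="DC.subject" content="Web crawler">
<meta name="DC.subject" content="web crawler">
<meta name="DC.subject" content="Institutional repository">
</head></html>
"""

collected = extract_keywords(sample_html)
unique = deduplicate(collected)
print(len(collected), "collected;", len(unique), "after deduplication")
```

On the sample page, two of the three collected keywords differ only in capitalization, so one is dropped. Lexical variations (for example singular and plural forms) would need a further normalization step that this sketch does not attempt.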
Similar resources
Towards a Keyword-Focused Web Crawler
This paper concerns predicting the content of textual web documents based on features extracted from web pages that link to them. It may be applied in an intelligent, keyword-focused web crawler. The experiments made on publicly available real data obtained from Open Directory Project with the use of several classification models are promising and indicate potential usefulness of the studied ap...
Institutional Repository
Background: Distance from home to school is an important influence on the decision to use active transport (AT); however, ecological perspectives would suggest this relationship may be moderated by individual, interpersonal, and environmental factors. This study investigates whether (i) gender, (ii) biological maturation, (iii) perceived family support for physical activity (PA),...
Institutional Repository
The global burden of foodborne disease due to the presence of contaminating microorganisms remains high, despite some notable examples of their successful reduction in some instances. Globally, the number of species of microorganisms responsible for foodborne diseases has increased over the past decades and as a result of the continued centralization of the food processing industry, outbreaks n...
Institutional Culture as Keyword
Institutional culture has become a buzzword in recent discussions of higher education in South Africa. Indeed, as references to it proliferate, there is a growing sense that institutional culture may well be the key to the successful transformation of higher education in South Africa. Or – to frame the matter as forcefully as do many recent analysts – it is simply the massive fact and bulk of i...
Journal
Journal title: Central European Journal of Educational Research
Year: 2022
ISSN: 2677-0326
DOI: https://doi.org/10.37441/cejer/2022/4/2/11395